Abstract

With increasing use of mobile devices, photo shar- 
ing services are experiencing greater popularity. Aside 
from providing storage, photo sharing services enable 
bandwidth-efficient downloads to mobile devices by 
performing server-side image transformations (resizing,
cropping). On the flip side, photo sharing services have 
raised privacy concerns such as leakage of photos to 
unauthorized viewers and the use of algorithmic recog- 
nition technologies by providers. To address these con- 
cerns, we propose a privacy-preserving photo encoding 
algorithm that extracts and encrypts a small, but signif- 
icant, component of the photo, while preserving the re- 
mainder in a public, standards-compatible, part. These 
two components can be separately stored. This technique 
significantly reduces the signal-to-noise ratio and the ac- 
curacy of automated detection and recognition on the 
public part, while preserving the ability of the provider to 
perform server-side transformations to conserve download
bandwidth usage. Our prototype privacy-preserving
photo sharing system, P3, works with Facebook, and can 
be extended to other services as well. P3 requires no 
changes to existing services or mobile application soft- 
ware, and adds minimal photo storage overhead. 

1 Introduction 

With the advent of mobile devices with high-resolution 
on-board cameras, photo sharing has become highly pop- 
ular. Users can share photos either through photo sharing 
services like Flickr or Picasa, or popular social network- 
ing services like Facebook or Google+. These photo 
sharing service providers (PSPs) now have a large user 
base, to the point where PSP photo storage subsystems 
have motivated interesting systems research [10].

However, this development has generated privacy concerns
(Section 2). Private photos have been leaked
from a prominent photo sharing site [15]. Furthermore,
widespread concerns have been raised about the application
of face recognition technologies in Facebook [3].
Despite these privacy threats, it is not clear that the us- 
age of photo sharing services will diminish in the near 
future. This is because photo sharing services provide 
several useful functions that, together, make for a seamless
photo browsing experience. In addition to providing
photo storage, PSPs also perform several server-side
image transformations (like cropping, resizing and color
space conversions) designed to improve user-perceived
latency of photo downloads and, incidentally, bandwidth 
usage (an important consideration when browsing photos 
on a mobile device). 

In this paper, we explore the design of a privacy- 
preserving photo sharing algorithm (and an associated 
system) that ensures photo privacy without sacrificing 
the latency, storage, and bandwidth benefits provided by 
PSPs. This paper makes two novel contributions that, to 
our knowledge, have not been reported in the literature 
(Section 6). First, the design of the P3 algorithm (Section 3),
which prevents leaked photos from leaking information,
and reduces the efficacy of automated processing
(e.g., face detection, feature extraction) on photos, while 
still permitting a PSP to apply image transformations. It 
does this by splitting a photo into a public part, which 
contains most of the volume (in bytes) of the original, and 
a secret part which contains most of the original's information.
Second, the design of the P3 system (Section 4),
which requires no modification to the PSP infrastructure 
or software, and no modification to existing browsers or 
applications. P3 uses interposition to transparently en- 
crypt images when they are uploaded from clients, and 
transparently decrypt and reconstruct images on the re- 
cipient side. 

Evaluations (Section 5) on four commonly used image
data sets, as well as micro-benchmarks on an implemen- 
tation of P3, reveal several interesting results. Across 
these data sets, there exists a "sweet spot" in the param- 
eter space that provides good privacy while at the same 
time preserving the storage, latency, and bandwidth ben- 
efits offered by PSPs. At this sweet spot, the public part 
of the image has low PSNR and algorithms like edge de- 
tection, face detection, face recognition, and SIFT fea- 
ture extraction are completely ineffective; no faces can 
be detected and correctly recognized from the public 
part, no correct features can be extracted, and a very 
small fraction of pixels defining edges are correctly es- 
timated. P3 image encryption and decryption are fast, 
and it is able to reconstruct images accurately even when 
the PSP's image transformations are not publicly known. 

P3 is a proof-of-concept of, and a step towards, easily
deployable privacy-preserving photo storage. Adoption
of this technology will be dictated by economic incentives:
for example, PSPs can offer privacy-preserving
photo storage as a premium service offered to privacy- 
conscious customers. 

2 Background and Motivation 

The focus of this paper is on PSPs like Facebook, Picasa, 
Flickr, and Imgur, who offer either direct photo shar- 
ing (e.g., Flickr, Picasa) between users or have integrated 
photo sharing into a social network platform (e.g., Facebook).
In this section, we describe some background before
motivating privacy-preserving photo sharing.

2.1 Image Standards, Compression and Scalability 

Over the last two decades, several standard image for- 
mats have been developed that enable interoperability 
between producers and consumers of images. Perhaps 
not surprisingly, most of the existing PSPs like Face- 
book, Flickr, Picasa Web, and many websites [71 
primarily use the most prevalent of these standards, the 
JPEG (Joint Photographic Experts Group) standard. In 
this paper, we focus on methods to preserve the privacy 
of JPEG images; supporting other standards such as GIF 
and PNG (usually used to represent computer-generated 
images like logos, etc.) is left to future work.

Beyond standardizing an image file format, JPEG per- 
forms lossy compression of images. A JPEG encoder 
consists of the following sequence of steps: 

Color Space Conversion and Downsampling. In this 
step, the raw RGB or color filter array (CFA) RGB im- 
age captured by a digital camera is mapped to a YUV 
color space. Typically, the two chrominance channels (U 
and V) are represented at lower resolution than the luminance
(brightness) channel (Y), reducing the amount of
pixel data to be encoded without significant impact on 
perceptual quality. 

DCT Transformation. In the next step, the image is 
divided into an array of blocks, each with 8x8 pixels, 
and the Discrete Cosine Transform (DCT) is applied to 
each block, resulting in 64 DCT coefficients per block. The
coefficient proportional to the mean value of the pixels in the
block is called the DC coefficient. The remaining 63 are called
AC coefficients.

Quantization. In this step, these coefficients are quan- 
tized; this is the only step in the processing chain where 
information is lost. For typical natural images, informa- 
tion tends to be concentrated in the lower frequency co- 
efficients (which on average have larger magnitude than 
higher frequency ones). For this reason, JPEG applies 
different quantization steps to different frequencies. The 
degree of quantization is user-controlled and can be varied
in order to achieve the desired trade-off between quality
of the reconstructed image and compression rate. We
note that in practice, images shared through PSPs tend to
be uploaded with high quality (and high rate) settings. 

Entropy Coding. In the final step, redundancy in the 
quantized coefficients is removed using variable length 
encoding of non-zero quantized coefficients and of runs 
of zeros in between non-zero coefficients. 
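
As a rough illustration of the DCT and quantization steps above, the following sketch computes an orthonormal 8x8 DCT in plain Python and quantizes the result. The names `dct2` and `quantize`, and the single uniform step size, are assumptions made for this example only. A real JPEG encoder also level-shifts the pixels, uses a per-frequency quantization table, and follows with entropy coding.

```python
import math

N = 8  # JPEG block size


def dct2(block):
    """Orthonormal 2-D DCT-II of an 8x8 block (list of 8 rows of 8 numbers)."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def quantize(coeffs, step):
    """Uniform quantization with one step size for every frequency.

    This is a simplification: JPEG applies a different step per frequency.
    """
    return [[int(round(c / step)) for c in row] for row in coeffs]


# A perfectly flat block: all of its energy ends up in the DC coefficient.
flat = [[100] * N for _ in range(N)]
coeffs = dct2(flat)
quantized = quantize(coeffs, step=4)
```

With this normalization the DC coefficient of a block equals 8 times the block's mean pixel value, which is why the DC coefficients alone are enough to render a coarse version of the image.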

Beyond storing JPEG images, PSPs perform several 
kinds of transformations on images for various reasons. 
First, when a photo is uploaded, PSPs statically resize the 
image to several fixed resolutions. For example, Facebook
transforms an uploaded photo into a thumbnail, a
"small" image (130x130) and a "big" image (720x720).
These transformations have multiple uses: they can reduce
storage¹, improve photo access latency for the common
case when users download either the big or the small
image, and also reduce bandwidth usage (an important 
consideration for mobile clients). In addition, PSPs per- 
form dynamic (i.e., when the image is accessed) server- 
side transformations; they may resize the image to fit 
screen resolution, and may also crop the image to match 
the view selected by the user. (We have verified, by analyzing
the Facebook protocol, that it supports both of
these dynamic operations.) These dynamic server-side
transformations enable low latency access to photos and 
reduce bandwidth usage. Finally, in order to reduce user- 
perceived latency further, Facebook also employs a spe- 
cial mode in the JPEG standard, called progressive mode. 
For photos stored in this mode, the server delivers the 
coefficients in increasing order (hence "progressive") so 
that the clients can start rendering the photo on the screen 
as soon as the first few coefficients are received, without 
having to receive all coefficients. 

In general, these transformations scale images in one 
fashion or another, and are collectively called image 
scalability transformations. Image scalability is crucial 
for PSPs, since it helps them optimize several aspects 
of their operation: it reduces photo storage, which can 
be a significant issue for a popular social network platform [10];
it can reduce user-perceived latency, and reduce
bandwidth usage, hence improving user satisfaction.

2.2 Threat Model, Goals and Assumptions 

In this paper, we focus on two specific threats to pri- 
vacy that result from uploading user images to PSPs. The 
first threat is unauthorized access to photos. A concrete 
instance of this threat is the practice of fusking, which 
attempts to reverse-engineer PSP photo URLs in order 
to access stored photos, bypassing PSP access controls. 



¹ We do not know if Facebook preserves the original image, but
high-end mobile devices can generate photos with 4000x4000 resolution,
and resizing these images to a few small fixed resolutions can save
space.

Figure 1: Privacy-Preserving Image Encoding Algorithm

Fusking has been applied to at least one PSP (Photobucket),
resulting in significant privacy leakage [15]. The
second threat is posed by automatic recognition technologies,
by which PSPs may be able to infer social contexts
not explicitly specified by users. Facebook's deployment
of face recognition technology has raised significant
privacy concerns in many countries (e.g., [3]).

The goal of this paper is to design and implement a 
system that enables users to ensure the privacy of their 
photos (with respect to the two threats listed above), 
while still benefiting from the image scalability optimiza- 
tions provided by the PSP. 

Implicit in this statement are several constraints, 
which make the problem significantly challenging. The 
resulting system must not require any software changes 
at the PSP, since this is a significant barrier to deploy- 
ment; an important implication of this constraint is that 
the image stored on the PSP must be JPEG-compliant. 
For a similar reason, the resulting system must also be 
transparent to the client. Finally, our solution must not 
significantly increase storage requirements at the PSP 
since, for large PSPs, photo storage is a concern. 

We make the following assumptions about trust in the 
various components of the system. We assume that all lo- 
cal software/hardware components on clients (mobile de- 
vices, laptops etc.) are completely trustworthy, including 
the operating system, applications and sensors. We assume
that PSPs are completely untrusted and may, either
by commission or omission, breach privacy in the two
ways described above. Furthermore, we assume eaves- 
droppers may attempt to snoop on the communication 
between PSP and a client. 



3 P3: The Algorithm 

In this section, we describe the P3 algorithm for ensuring 
privacy of photos uploaded to PSPs. In the next section, 
we describe the design and implementation of a complete 
system for privacy-preserving photo sharing. 



3.1 Overview 

One possibility for preserving the privacy of photos is 
end-to-end encryption. Senders² may encrypt photos before
uploading, and recipients use a shared secret key
to decrypt photos on their devices. This approach can- 
not provide image scalability, since the photo represen- 
tation is non-JPEG compliant and opaque to the PSP so 
it cannot perform transformations like resizing and crop- 
ping. Indeed, PSPs like Facebook reject attempts to up- 
load fully-encrypted images. 

A second approach is to leverage the JPEG image 
compression pipeline. Current image compression stan- 
dards use a well-known DCT dictionary when computing 
the DCT coefficients. A private dictionary [9], known
only to the sender and the authorized recipients, can be 
used to preserve privacy. Using the coefficients of this 
dictionary, it may be possible for PSPs to perform im- 
age scaling transformations. However, as currently de- 
fined, these coefficients result in a non-JPEG compliant 
bit-stream, so PSP-side code changes would be required 
in order to make this approach work. 

A third strawman approach might selectively hide 
faces by performing face detection on an image before 
uploading. This would leave a JPEG-compliant image 
in the clear, with the hidden faces stored in a separate 
encrypted part. At the recipient, the image can be re- 
constructed by combining the two parts. However, this 
approach does not address our privacy goals completely: 
if an image is leaked from the PSP, attackers can still ob- 
tain significant information from the non-obscured parts 
(e.g., torsos, other objects in the background etc.). 

Our approach to privacy-preserving photo sharing
uses selective encryption like this, but has a different
design. In this approach, called P3, a photo is divided 
into two parts, a public part and a secret part. The pub- 
lic part is exposed to the PSP, while the secret part is 
encrypted and shared between the sender and the recipi- 
ents (in a manner discussed later). Given the constraints 
discussed in Section 2, the public and secret parts must
satisfy the following requirements: 

• It must be possible to represent the public part as a
JPEG-compliant image. This will allow PSPs to perform
image scaling.

• However, intuitively, most of the "important" information
in the photo must be in the secret part. This would
prevent attackers from making sense of the public part of 
the photos even if they were able to access these photos. 
It would also prevent PSPs from successfully applying 
recognition algorithms. 

• Most of the volume (in bytes) of the image must reside
in the public part. This would permit PSP server-side
image scaling to have the bandwidth and latency benefits
discussed above.

• The combined size of the public and secret parts of the
image must not significantly exceed the size of the original
image, as discussed above.

² We use "sender" to denote the user of a PSP who uploads images
to the PSP.

Our P3 algorithm, which satisfies these requirements, 
has two components: a sender-side encryption algorithm,
and a recipient-side decryption algorithm.

3.2 Sender-Side Encryption 

JPEG compression relies on the sparsity in the DCT do- 
main of typical natural images: a few (large magnitude) 
coefficients provide most of the information needed to 
reconstruct the pixels. Moreover, as the quality of cam- 
eras on mobile devices increases, images uploaded to 
PSPs are typically encoded at high quality. P3 leverages 
both the sparsity and the high quality of these images. 
First, because of sparsity, most information is contained 
in a few coefficients, so it is sufficient to degrade a few 
such coefficients, in order to achieve significant reduc- 
tions in quality of the public image. Second, because the 
quality is high, quantization of each coefficient is very 
fine and the least significant bits of each coefficient represent
very small incremental gains in reconstruction quality.
P3's encryption algorithm encodes the most significant
bits of (the few) significant coefficients in the secret
part, leaving everything else (less important coefficients, 
and least significant bits of more important coefficients) 
in the public part. We concretize this intuition in the following
design for P3 sender-side encryption.

The selective encryption algorithm is, conceptually, 
inserted into the JPEG compression pipeline after the 
quantization step. At this point, the image has been 
converted into frequency-domain quantized DCT coef- 
ficients. While there are many possible approaches to 
extracting the most significant information, P3 uses a rel- 
atively simple approach. First, it extracts the DC coeffi- 
cients from the image into the secret part, replacing them 
with zero values in the public part. The DC coefficients 
represent the average value of each 8x8 pixel block of 
the image; these coefficients usually contain enough in- 
formation to represent thumbnail versions of the original 
image with enough visual clarity. 

Second, P3 uses a threshold-based splitting algorithm
in which each AC coefficient $y(i)$ is compared against a
threshold $T$ and processed as follows:

• If $|y(i)| \le T$, then the coefficient is represented in the
public part as is, and in the secret part with a zero.

• If $|y(i)| > T$, the coefficient is replaced in the public part
with $T$, and the secret part contains the magnitude of the
difference as well as the sign.

Intuitively, this approach clips off the significant coefficients
at $T$. $T$ is a tunable parameter that represents the
trade-off between storage/bandwidth overhead and privacy;
a smaller $T$ extracts more signal content into the
secret part, but can potentially incur greater storage overhead.
We explore this trade-off empirically in Section 5.
Notice that both the public and secret parts are JPEG-compliant
images, and, after they have been generated,
can be subjected to entropy coding.

Figure 2: P3 Overall Processing Chain
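
The splitting rule described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `p3_split`, the flat list layout with the DC coefficient at index 0, and the sample values are all assumptions made for the example.

```python
def p3_split(block, T):
    """Split one block of quantized DCT coefficients into public and secret parts.

    block[0] is taken to be the DC coefficient; the rest are AC coefficients.
    """
    public = [0] * len(block)
    secret = [0] * len(block)
    secret[0] = block[0]          # DC moves to the secret part; public keeps 0
    for i in range(1, len(block)):
        y = block[i]
        if abs(y) <= T:
            public[i] = y         # small coefficient stays public, sign included
        else:
            public[i] = T         # clipped to the threshold, sign dropped
            sign = 1 if y > 0 else -1
            secret[i] = sign * (abs(y) - T)  # signed excess over the threshold
    return public, secret


# Hypothetical block (shortened to 6 coefficients for readability).
public, secret = p3_split([50, 3, -12, 0, 25, -4], T=10)
```

For this input the public part is `[0, 3, 10, 0, 10, -4]` and the secret part is `[50, 0, -2, 0, 15, 0]`: the two large coefficients both appear as `10` in public, with no hint of their sign.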

Once the public and secret parts are prepared, the se- 
cret part is encrypted and, conceptually, both parts can be 
uploaded to the PSP (in practice, our system is designed 
differently, for reasons discussed in Section 4). We also
defer a discussion of the encryption scheme to Section 4.

3.3 Recipient-side Decryption and Reconstruction 

While the sender-side encryption algorithm is conceptually
simple, the operations on the recipient side are
somewhat trickier. At the recipient, P3 must decrypt the
secret part and reconstruct the original image by combining
the public and secret parts. P3's selective encryption
is reversible, in the sense that the public and secret
parts can be recombined to reconstruct the original image.
This is straightforward when the public image is
stored unchanged, but requires a more detailed analysis
in the case when the PSP performs some processing on
the public image (e.g., resizing, cropping, etc.) in order to
reduce storage, latency or bandwidth usage.

In order to derive how to reconstruct an image when 
the public image has been processed, we start by express- 
ing the reconstruction for the unprocessed case as a series 
of linear operations. 

Let the threshold for our splitting algorithm be denoted
$T$. Let $y$ be a block of DCT coefficients corresponding
to an $8 \times 8$ pixel block in the original image. Denote
by $x_p$ and $x_s$ the corresponding DCT coefficient values
assigned to the public and secret images, respectively,
for the same block³. For example, if one of those coefficients
is such that $|y(i)| > T$, we will have that
$x_p(i) = T$ and $x_s(i) = \mathrm{sign}(y(i))\,(|y(i)| - T)$. Since in
our algorithm the sign information is encoded either in
the public or in the secret part, depending on the coefficient
magnitude, it is useful to explicitly consider sign
information here. To do so we write $x_p = S_p \cdot a_p$ and
$x_s = S_s \cdot a_s$, where $a_p$ and $a_s$ are the absolute values of $x_p$
and $x_s$, and $S_p$ and $S_s$ are diagonal matrices with sign information,
i.e., $S_p = \mathrm{diag}(\mathrm{sign}(x_p))$, $S_s = \mathrm{diag}(\mathrm{sign}(x_s))$.
Now let $w[i] = T$ if $S_s[i] \neq 0$ (and $w[i] = 0$ otherwise), where $i$ is a coefficient
index, so $w$ marks the positions of the above-threshold
coefficients.

³ For ease of exposition, we represent these blocks as 64x1 vectors.

The key observation is that $x_p$ and $x_s$ cannot be directly
added to recover $y$ because the sign of a coefficient
above threshold is encoded correctly only in the
secret image. Thus, even though the public image conveys
sign information for that coefficient, it might not
be correct. As an example, let $y(i) < -T$; then we will
have that $x_p(i) = T$ and $x_s(i) = -(|y(i)| - T)$, thus
$x_s(i) + x_p(i) \neq y(i)$. For coefficients below threshold, $y(i)$
can be recovered trivially since $x_s(i) = 0$ and $x_p(i) = y(i)$.
Note that an incorrect sign in the public image occurs only
for coefficients $y(i)$ above threshold, and by definition,
for all those coefficients the public value is $x_p(i) = T$.
Note also that removing these signs significantly increases
the distortion in the public images and makes it
more challenging for an attacker to approximate the original
image based on only the public one.

In summary, the reconstruction can be written as a series
of linear operations:

$$y = S_p \cdot a_p + S_s \cdot a_s + (S_s - S_s^2) \cdot w \qquad (1)$$

where the first two terms correspond to directly adding
the corresponding blocks from the public and secret images,
while the third term is a correction factor to account
for the incorrect sign of some coefficients in the public
image. This correction factor is based on the sign of the
coefficients in the secret image and distinguishes three
cases. If $x_s(i) = 0$ or $x_s(i) > 0$, then $y(i) = x_s(i) + x_p(i)$
(no correction), while if $x_s(i) < 0$ we have

$$y(i) = x_s(i) + x_p(i) - 2T = x_s(i) + T - 2T = x_s(i) - T.$$

Note that the operations can be very easily represented
and implemented with if/then/else conditions, but the algebraic
representation of (1) will be needed to determine
how to operate when the public image has been subject to
server-side processing. In particular, from (1), and given
that the DCT is a linear operator, it becomes apparent
that it would be possible to reconstruct the images in the
pixel domain. That is, we could convert $S_p \cdot a_p$, $S_s \cdot a_s$
and $(S_s - S_s^2) \cdot w$ into the pixel domain and simply add
these three images pixel by pixel. Further note that the
third image, the correction factor, does not depend on the
public image and can be completely derived from the secret
image.
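
The algebraic form of the reconstruction can be checked with a small sketch. It evaluates the three terms of the reconstruction formula per coefficient. The names `sign` and `p3_reconstruct`, the list layout (DC at index 0), and the separate handling of the DC coefficient are assumptions made for this example rather than details given in the text.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def p3_reconstruct(public, secret, T):
    """Recombine unprocessed public and secret blocks into the original block."""
    y = [0] * len(public)
    y[0] = secret[0]  # assumption: DC is stored whole in the secret part
    for i in range(1, len(public)):
        s = sign(secret[i])          # entry of S_s for this coefficient
        w = T if s != 0 else 0       # w marks above-threshold positions
        # public + secret + correction; (s - s*s) is 0 for s in {0, 1}, -2 for s = -1
        y[i] = public[i] + secret[i] + (s - s * s) * w
    return y


# Hypothetical public/secret parts produced with threshold T = 10.
restored = p3_reconstruct([0, 3, 10, 0, 10, -4], [50, 0, -2, 0, 15, 0], T=10)
```

Here the coefficient stored as public `10` and secret `-2` is restored to `10 - 2 - 20 = -12`, while the one stored as public `10` and secret `15` needs no correction and becomes `25`.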

We now consider the case where the PSP applies a linear
operator $A$ to the public part. Many interesting image
transformations such as filtering, cropping⁴, scaling (resizing),
and overlapping can be expressed by linear operators.
Thus, when the public part is requested from the
PSP, $A \cdot S_p \cdot a_p$ will be received. Then the goal is for the
recipient to reconstruct $A \cdot y$ given the processed public
image $A \cdot S_p \cdot a_p$ and the unprocessed secret information.
Based on the reconstruction formula of (1), and the linearity
of $A$, it is clear that the desired reconstruction can
be obtained as follows:

$$A \cdot y = A \cdot S_p \cdot a_p + A \cdot S_s \cdot a_s + A \cdot (S_s - S_s^2) \cdot w \qquad (2)$$

Moreover, since the DCT is also linear, these
operations can be applied directly in the pixel domain,
without needing to find a transform domain representation.
As an example, if cropping is involved, it would be
enough to crop the secret image and the image obtained
by applying an inverse DCT to $(S_s - S_s^2) \cdot w$.

⁴ Cropping at 8x8 pixel boundaries is a linear operator; cropping at
arbitrary boundaries can be approximated by cropping at the nearest
8x8 boundary.
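
To make the role of linearity concrete, the sketch below applies a toy linear operator to the public part and then reconstructs the processed original from the processed public part plus the unprocessed secret part. The operator `average_pairs` (a 2x downscale on a 1-D vector), the helper names, and the sample values are assumptions for illustration. The paper's operators act on images, and a real recipient would first have to infer which operator the PSP applied.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def correction(secret, T):
    """Correction term: -2T where the secret value is negative, else 0."""
    return [(sign(s) - sign(s) ** 2) * T for s in secret]


def average_pairs(v):
    """A toy linear operator A: average each pair of neighbours (2x downscale)."""
    return [(v[k] + v[k + 1]) / 2 for k in range(0, len(v) - 1, 2)]


def reconstruct_processed(processed_public, secret, T, op):
    """Apply op to the secret and correction terms, then add all three."""
    a_secret = op(secret)
    a_corr = op(correction(secret, T))
    return [p + s + c for p, s, c in zip(processed_public, a_secret, a_corr)]


# Hypothetical AC coefficients and their split with T = 10.
y = [3, -12, 0, 25, -4, 7]
T = 10
public = [3, 10, 0, 10, -4, 7]
secret = [0, -2, 0, 15, 0, 0]

received = average_pairs(public)  # what the PSP would return
result = reconstruct_processed(received, secret, T, average_pairs)
```

The recipient never sees `y`, yet `result` equals `average_pairs(y)`, because applying the operator to each term and summing gives the same answer as applying it to the sum.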

We have left an exploration of nonlinear operators to
future work. It may be possible to support certain types
of nonlinear operations, such as pixel-wise color remapping,
as found in popular apps (e.g., Instagram). If such an
operation can be represented as a one-to-one mapping for
all legitimate values⁵, e.g., 0-255 RGB values, we can
achieve the same level of reconstruction quality as with
linear operators: at the recipient, we can reverse the mapping
on the public part, combine this with the unprocessed
secret part, and re-apply the color mapping on
the resulting image. However, this approach can result
in some loss and we have left a quantitative exploration
of this loss to future work.

3.4 Algorithmic Properties of P3 

Privacy Properties. By encrypting significant signal information,
P3 can preserve the privacy of images by distorting
them and by foiling detection and recognition algorithms
(Section 5). Given only the public part, the attacker
can guess the threshold $T$ by assuming it to be the
most frequent non-zero value. If this guess is correct,
the attacker knows the positions of the significant coefficients,
but not the range of values of these coefficients.
Crucially, the sign of the coefficient is also not known.
Sign information tends to be "random" in that positive
and negative coefficients are almost equally likely and
there is very limited correlation between signs of different
coefficients, both within a block and across blocks. It
can be shown that if the sign is unknown, and no prior information
exists that would bias our guess, it is actually
best, in terms of mean-square error (MSE), to replace the
coefficient with unknown sign in the public image by 0⁶.
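
The claim that 0 is the best guess can be checked with a short calculation. As a sketch, assume the hidden coefficient $y$ has magnitude $v \ge T$ and is positive or negative with probability $1/2$ each, and let $g$ be the attacker's guess (the symbols $v$ and $g$ are introduced here for illustration only):

```latex
\begin{aligned}
\mathbb{E}\big[(y - g)^2\big]
  &= \tfrac{1}{2}(v - g)^2 + \tfrac{1}{2}(v + g)^2 \\
  &= v^2 + g^2 \;\ge\; v^2 ,
\end{aligned}
```

with equality only when $g = 0$. Taking $v = T$, a guess of $0$ gives an MSE of $T^2$, while a guess of magnitude $T$ gives $2T^2$.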

Finally, we observe that proving the privacy properties
of our approach is challenging. If the public part is
leaked from the PSP, proving that no human can extract
visual information from the public part would require
having an accurate understanding of visual perception.
Instead, we rely on metrics commonly used in the signal
processing community in our evaluation (Section 5).
We note that the prevailing methodology in the signal
processing community for evaluating the efficacy of image
and video privacy is empirical subjective evaluation
using user studies, or objective evaluation using metrics [46].
In Section 5, we resort to an objective metrics-based
evaluation, showing the performance of P3 on several
image corpora.

⁵ Often, this is the case for most color remapping operations.

⁶ If an adversary sees $T$ in the public part, replacing it with 0 will
have an MSE of $T^2$. However, if we use any non-zero values as a
guess, MSE will be at least $0.5 \times (2T)^2 = 2T^2$ because we will have a
wrong sign with probability 0.5 and we know that the magnitude is at
least equal to $T$.

Figure 3: P3 System Architecture

Other Properties. P3 satisfies the other requirements 
we have discussed above. It leaves, in the clear, a JPEG- 
compliant image (the public part), on which the PSP can 
perform transformations to save storage and bandwidth. 
The threshold T permits trading off increased storage for 
increased privacy; for images whose signal content is in 
the DC component and a few highly-valued coefficients,
the secret part can encode most of this content, while the 
public part contains a significant fraction of the volume 
of the image in bytes. As we show in our evaluation later, 
most images are sparse and satisfy this property. Finally, 
our approach of encoding the large coefficients decreases 
the entropy both in the public and secret parts, result- 
ing in better compressibility and only slightly increased 
overhead overall relative to the unencrypted compressed 
image. 

However, the P3 algorithm has an interesting conse- 
quence: since the secret part cannot be scaled (because, 
in general, the transformations that a PSP performs can- 
not be known a priori) and must be downloaded in its 
entirety, the bandwidth savings from P3 will always be 
lower than downloading a resized original image. The 
size of the secret part is determined by T: higher val- 
ues of T result in smaller secret parts, but provide less 
privacy, a trade-off we quantify in Section 5.

4 P3: System Design 

In this section, we describe the design of a system for
privacy-preserving photo sharing. This system,
also called P3, has two desirable properties described
earlier. First, it requires no software modifications at 
the PSP. Second, it requires no modifications to client- 
side browsers or image management applications, and 
only requires a small footprint software installation on 
clients. These properties permit fairly easy deployment 
of privacy-preserving photo sharing. 

4.1 P3 Architecture and Operation 

Before designing our system, we explored the protocols 
used by PSPs for uploading and downloading photos. 
Most PSPs use HTTP or HTTPS to upload messages; 
we have verified this for Facebook, Picasa Web, Flickr, 
PhotoBucket, Smugmug, and Imageshack. This suggests 
a relatively simple interposition architecture, depicted in 
Figure 3. In this architecture, browsers and applications
are configured to use a local HTTP/HTTPS proxy and all 
accesses to PSPs go through the proxy. The proxy ma- 
nipulates the data stream to achieve privacy preserving 
photo storage, in a manner that is transparent both to the 
PSP and the client. In the following paragraphs, we de- 
scribe the actions performed by the proxy at the sender 
side and at one or more recipients. 

Sender-side Operation. When a sender transmits a
photo taken by the built-in camera, the local proxy acts as
a middlebox and splits the uploaded image into a public
and a secret part (as discussed in Section 3). Since the
proxy resides on the client device (and hence is within
the trust boundary per our assumptions, Section 2), it is
reasonable to assume that the proxy can decrypt and encrypt
HTTPS sessions in order to encrypt the photo.

We have not yet discussed how photos are encrypted; 
in our current implementation, we assume the existence 
of a symmetric shared key between a sender and one or 
more recipients. This symmetric key is assumed to be 
distributed out of band. 

Ideally, it would have been preferable to store both the 
public and the secret parts on the PSP. Since the public 
part is a JPEG-compliant image, we explored methods to 
embed the secret part within the public part. The JPEG 
standard allows users to embed arbitrary application-specific
markers with application-specific data in images;
the standard defines 16 such markers. We attempted
to use an application-specific marker to embed
the secret part; unfortunately, at least 2 PSPs (Facebook
and Flickr) strip all application-specific markers.

Our current design therefore stores the secret part on a 
cloud storage provider (in our case, Dropbox). Note that 
because the secret part is encrypted, we do not assume 
that the storage provider is trusted. 

Finally, we discuss how photos are named. When a 
user uploads a photo to a PSP, that PSP may transform 
the photo in ways discussed below. Despite this, most 
photo-sharing services (Facebook, Picasa Web, Flickr,
Smugmug, and Imageshack⁷) assign a unique ID for all
variants of the photo. This ID is returned to the client, as
part of the API [22, 24], when the photo is updated.

P3's sender-side proxy performs the following operations
on the public and secret parts. First, it uploads
the public part to the PSP either using HTTP or HTTPS 
(e.g., Facebook works only with HTTPS, but Flickr sup- 
ports HTTP). This returns an ID, which is then used to 
name a file containing the secret part. This file is then 
uploaded to the storage provider. 
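
The sender-side steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the P3 implementation: `InMemoryPSP`, `InMemoryStorage`, `send_photo`, `toy_split`, and `toy_encrypt` are hypothetical names, the split is a placeholder for the algorithm of Section 3, and the XOR "encryption" is a stand-in for a real symmetric cipher.

```python
from typing import Callable, Dict, Tuple


class InMemoryPSP:
    """Hypothetical stand-in for a PSP: stores a public part, returns its ID."""

    def __init__(self) -> None:
        self.photos: Dict[str, bytes] = {}

    def upload(self, public_part: bytes) -> str:
        photo_id = f"photo-{len(self.photos) + 1}"
        self.photos[photo_id] = public_part
        return photo_id


class InMemoryStorage:
    """Hypothetical stand-in for the untrusted storage provider."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def put(self, name: str, blob: bytes) -> None:
        self.files[name] = blob


def send_photo(
    photo: bytes,
    split: Callable[[bytes], Tuple[bytes, bytes]],
    encrypt: Callable[[bytes], bytes],
    psp: InMemoryPSP,
    storage: InMemoryStorage,
) -> str:
    """Upload the public part, then store the encrypted secret part under the returned ID."""
    public_part, secret_part = split(photo)
    photo_id = psp.upload(public_part)            # the PSP assigns the ID
    storage.put(photo_id, encrypt(secret_part))   # the ID names the secret file
    return photo_id


def toy_split(photo: bytes) -> Tuple[bytes, bytes]:
    """Placeholder split (NOT the P3 algorithm): first half public, second half secret."""
    half = len(photo) // 2
    return photo[:half], photo[half:]


def toy_encrypt(blob: bytes) -> bytes:
    """Placeholder for a real cipher; XOR is NOT secure and is its own inverse."""
    return bytes(b ^ 0x5A for b in blob)


psp, storage = InMemoryPSP(), InMemoryStorage()
photo_id = send_photo(b"abcdefgh", toy_split, toy_encrypt, psp, storage)
```

The point of the sketch is only the ordering: the secret part cannot be named until the PSP has returned an ID for the public part.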

Recipient-side Operation. Recipients are also config- 
ured to run a local web proxy. A client device downloads 
a photo from a PSP using an HTTP get request. The 
URL for the HTTP request contains the ID of the photo 
being downloaded. When the proxy sees this HTTP re- 
quest, it passes the request on to the PSP, but also initiates 
a concurrent download of the secret part from the stor- 
age provider using the ID embedded in the URL. When 
both the public and secret parts have been received, the 
proxy performs the decryption and reconstruction
procedure discussed in Section 3 and passes the resulting
image to the application as the response to the HTTP
get request. Note, however, that a secret part may be
reused multiple times: for example, a user may first view
a thumbnail image and then download a larger image.
In these scenarios, it suffices to download the secret part
once, so the proxy can maintain a cache of downloaded
secret parts in order to reduce bandwidth usage and
improve latency.
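
The caching behavior described above can be sketched as a small lookup keyed by photo ID. This is an illustrative sketch only; the class name `SecretPartCache`, the `fetch` callback, and the `downloads` counter are assumptions introduced here, and eviction is omitted.

```python
from typing import Callable, Dict


class SecretPartCache:
    """Downloads each secret part at most once, keyed by the photo ID in the URL."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                  # hypothetical download function
        self._cache: Dict[str, bytes] = {}
        self.downloads = 0                   # counts actual fetches, for illustration

    def get(self, photo_id: str) -> bytes:
        if photo_id not in self._cache:
            self._cache[photo_id] = self._fetch(photo_id)
            self.downloads += 1
        return self._cache[photo_id]


# A thumbnail request followed by a full-size request for the same photo.
store = {"photo-1": b"encrypted-secret"}
cache = SecretPartCache(lambda pid: store[pid])
thumb_secret = cache.get("photo-1")
full_secret = cache.get("photo-1")
```

Both requests receive the same secret part, but the storage provider is contacted only once.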

There is an interesting subtlety in the photo
reconstruction process. As discussed in Section 3, when the
server-side transformations are known, nearly exact
reconstruction is possible. In our case, the precise
transformations are not known, in general, to the proxy, so
the problem becomes more challenging.

By uploading photos, and inspecting the results, we 
are able to tell, generally speaking, what kinds of trans- 
formations PSPs perform. For instance, Facebook trans- 
forms a baseline JPEG image to a progressive format and 
at the same time wipes out all irrelevant markers. Both 
Facebook and Flickr statically resize the uploaded image 
with different sizes; for example, Facebook generates at 
least three files with different resolutions, while Flickr 
generates a series of fixed-resolution images whose num- 
ber depends on the size of the uploaded image. We 
cannot tell if these PSPs actually store the original im- 
ages or not, and we conjecture that the resizing serves to 

Footnote: PhotoBucket does not assign such an ID, which
explains its vulnerability to fusking, as discussed
earlier.

Footnote: When the transformations are known, the only
errors that can arise are due to storing the correction term
in Section 3 in a lossy JPEG format that has to be decoded
for processing in the pixel domain. Even if quantization is
very fine, errors may be introduced because the DCT
transform is real-valued and pixel values are integers, so
the inverse transform of the correction term has to be
rounded to the nearest integer pixel value.



limit storage and is also perhaps optimized for
common-case devices. For example, the largest resolution
of photos stored by Facebook is 720x720, regardless of the
original resolution of the image. In addition, Facebook can
dynamically resize and crop an image; the cropping
geometry and the size specified for resizing are both
encoded in the HTTP get URL, so the proxy is able to
determine those parameters. Furthermore, by inspecting the
JPEG header, we can tell some kinds of transformations
that may have been performed: e.g., whether a baseline
image was converted to progressive or vice versa, and what
sampling factors, cropping, scaling, etc. were applied.

However, some other critical image processing param- 
eters are not visible to the outside world. For example, 
the process of resizing an image using down sampling is 
often accompanied by a filtering step for antialiasing and 
may be followed by a sharpening step, together with a 
color adjustment step on the downsampled image. Not 
knowing which of these steps have been performed, and 
not knowing the parameters used in these operations, the 
reconstruction procedure can result in lower quality im- 
ages. 

To understand what transformations have been
performed, we are reduced to searching the space of
possible transformations for an outcome that matches the
output of transformations performed by the PSP. Note that
this reverse engineering need only be done when a PSP
re-jiggers its image transformation pipeline, so it should
not be too onerous. Fortunately, for Facebook and Flickr,
we were able to get reasonable reconstruction results on
both systems (Section 5). These reconstruction results
were obtained by exhaustively searching the parameter
space with salient options based on commonly-used
resizing techniques [28]. More precisely, we select several
candidate settings for colorspace conversion, filtering,
sharpening, enhancing, and gamma corrections, and
then compare the output of these with that produced by
the PSP. Our reconstruction results are presented in
Section 5.
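
The exhaustive search described above can be illustrated with a toy example. The sketch below is an assumption-laden simplification: it works on a one-dimensional list of pixel values, and the candidate pipeline (`box_blur`, `gamma_correct`, `downsample`) and the mean-squared-error comparison are hypothetical stand-ins for the real colorspace, filtering, sharpening, and gamma settings, which the text does not enumerate.

```python
import itertools
from typing import List, Sequence, Tuple


def box_blur(signal: Sequence[float], radius: int) -> List[float]:
    """Simple antialiasing stand-in: average over a window of the given radius."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def gamma_correct(signal: Sequence[float], gamma: float) -> List[float]:
    """Gamma adjustment on values in [0, 255]."""
    return [255.0 * (v / 255.0) ** gamma for v in signal]


def downsample(signal: Sequence[float], factor: int) -> List[float]:
    """Keep every factor-th sample."""
    return list(signal[::factor])


def pipeline(signal: Sequence[float], radius: int, gamma: float) -> List[float]:
    """One candidate transformation: blur, gamma-correct, then downsample by 2."""
    return downsample(gamma_correct(box_blur(signal, radius), gamma), 2)


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def search(original, psp_output, radii, gammas) -> Tuple[int, float]:
    """Return the candidate (radius, gamma) whose output best matches the PSP's."""
    return min(
        itertools.product(radii, gammas),
        key=lambda p: mse(pipeline(original, *p), psp_output),
    )


original = [10, 200, 35, 90, 250, 60, 120, 15]
# Pretend the PSP used radius=1, gamma=0.9; the proxy does not know this.
psp_output = pipeline(original, 1, 0.9)
best = search(original, psp_output, radii=[0, 1, 2], gammas=[0.8, 0.9, 1.0, 1.1])
```

In practice the uploaded test photo plays the role of `original`, and the image returned by the PSP plays the role of `psp_output`; the search only has to be repeated when the PSP changes its pipeline.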

4.2 Discussion 

Privacy Properties. Beyond the privacy properties of
the P3 algorithm, the P3 system achieves the privacy
goals outlined in Section 2. Since the proxy runs on the
client for both sender and receiver, the trusted computing
base for P3 includes the software and hardware device on
the client. It may be possible to reduce the footprint of
the trusted computing base even further using a trusted
platform module [49] and trusted sensors [33], but we
have deferred that to future work.



Footnote: This approach is clearly fragile, since the PSP
can change the kinds of transformations it performs on
photos. Please see the discussion below on this issue.

P3's privacy depends upon the strength of the symmetric
key used to encrypt the secret part. We assume
the use of AES-based symmetric keys, distributed out of
band. Furthermore, as discussed above, in P3 the storage
provider cannot leak photo privacy because the secret
part is encrypted. The storage provider, or for that
matter the PSP, can tamper with images and hinder
reconstruction; protecting against such tampering is
beyond the scope of the paper. For the same reason,
eavesdroppers can potentially tamper with the public or
the secret part, but cannot leak photo privacy.

PSP Co-operation. The P3 design we have described as- 
sumes no co-operation from the PSP. As a result, this im- 
plementation is fragile and a PSP can prevent users from 
using their infrastructure to store P3's public parts. For 
instance, they can introduce complex nonlinear transfor- 
mations on images in order to foil reconstruction. They 
may also run simple algorithms to detect images where 
coefficients might have been thresholded, and refuse to 
store such images. 

Our design is merely a proof of concept that the tech- 
nology exists to transparently protect the privacy of pho- 
tos, without requiring infrastructure changes or signifi- 
cant client-side modification. Ultimately, PSPs will need 
to cooperate in order for photo privacy to be possible, and 
this cooperation depends upon the implications of photo 
sharing on their respective business models. 

At one extreme, if only a relatively small fraction of 
a PSP's user base uses P3, a PSP may choose to benev- 
olently ignore this use (because preventing it would re- 
quire commitment of resources to reprogram their infras- 
tructure). At the other end, if PSPs see a potential loss 
in revenue from not being able to recognize objects/faces 
in photos, they may choose to react in one of two ways: 
shut down P3, or offer photo privacy for a fee to users. 
However, in this scenario, a significant number of users 
see value in photo privacy, so we believe that PSPs will 
be incentivized to offer privacy-preserving storage for a 
fee. In a competitive marketplace, even if one PSP were 
to offer privacy-preserving storage as a service, others 
will likely follow suit. For example, Flickr already has a 
"freemium" business model and can simply offer privacy 
preserving storage to its premium subscribers. 

If a PSP were to offer privacy-preserving photo storage
as a service, we believe it will have incentives to
use a P3-like approach (which permits image scaling and
transformations), rather than end-to-end encryption. With
P3, a PSP can assure its users that it is only able to see
the public part (reconstruction would still happen at the
client), yet provide (as a service) the image
transformations that can reduce user-perceived latency
(which is an important consideration for retaining users
of online services [10]).



Finally, with PSP co-operation, two aspects of our P3 
design become simpler. First, the PSP image transfor- 
mation parameters would be known, so higher quality 
images would result. Second, the secret part of the im- 
age could be embedded within the public part, obviating 
the need for a separate online storage provider. 

Extensions. Extending this idea to video is feasible, 
but left for future work. As an initial step, it is possi- 
ble to introduce the privacy preserving techniques only 
to the I-frames, which are coded independently using 
tools similar to those used in JPEG. Because other frames 
in a "group of pictures" are coded using an I-frame as 
a predictor, quality reductions in an I-frame propagate 
through the remaining frames. In future work, we plan 
to study video-specific aspects, such as how to process 
motion vectors or how to enable reconstruction from a 
processed version of a public video. 

5 Evaluation 

In this section, we report on an evaluation of P3. Our 
evaluation uses objective metrics to characterize the pri- 
vacy preservation capability of P3, and it also reports, 
using a full-fledged implementation, on the processing 
overhead induced by sender and receiver side encryption. 

5.1 Methodology 

Metrics. Our first metric for P3 performance is the stor- 
age overhead imposed by selective encryption. Photo 
storage space is an important consideration for PSPs, and 
a practical scheme for privacy preserving photo storage 
must not incur large storage overheads. We then measure 
the efficacy of privacy preservation using PSNR (peak 
signal-to-noise ratio), a metric commonly used in sig- 
nal processing. While the shortcomings of this metric in 
terms of quantifying perceptual quality are well known, 
it does provide a simple objective way of quantifying 
degradation. Note also that public images that are
highly degraded, with very low PSNR values, will be
commonly agreed to represent very poor quality. To complement
PSNR, we also present the visual representation of the 
public part of an image, to let the reader judge the effi- 
cacy of P3; lack of space prevents us from a more de- 
tailed exposition. We then evaluate the efficacy of pri- 
vacy preservation by measuring the performance of state- 
of-the-art edge and face detection algorithms, the SIFT 
feature extraction algorithm, and a face recognition al- 
gorithm on P3. We conclude the evaluation of privacy 
by discussing the efficacy of guessing attacks. Finally, 
we quantify the reconstruction performance, bandwidth 
savings and the processing overhead of P3. 

Datasets. We evaluate P3 using four image datasets.
First, as a baseline, we use the "miscellaneous" volume
in the USC-SIPI image dataset [8]. This volume has
44 color and black-and-white images of various
objects, people, scenery, and so forth, and contains
many canonical images (including Lena) commonly used
in the image processing community. Our second dataset
is from INRIA [4], and contains 1491 full color images
from vacation scenes including a mountain, a river,
a small town, other interesting topographies, etc. This
dataset has greater diversity than the USC-SIPI
dataset in terms of both resolutions and textures; its
images vary in size up to 5 MB, while the USC-SIPI
dataset's images are all under 1 MB.

Figure 4: Screenshot (Facebook) with/without decryption

We also use the Caltech face dataset [1] for our face
detection experiment. This has 450 frontal color face
images of about 27 unique faces depicted under different
circumstances (illumination, background, facial
expressions, etc.). All images contain at least one large
dominant face, and zero or more additional faces. Finally,
the Color FERET Database [2] is used for our face
recognition experiment. This dataset is specifically
designed for developing, testing, and evaluating face
recognition algorithms, and contains 11,338 facial images
of 994 subjects at various angles.

Implementation. We also report results from an
implementation for Facebook [21]. We chose the
Android 4.x mobile operating system as our client
platform, since the bandwidth limitations together with the
availability of camera sensors on mobile devices
motivate our work. The mitmproxy software tool [39] is
used as a trusted man-in-the-middle proxy entity in the
system. To execute mitmproxy on Android, we
used the kivy/python-for-android software [32]. Our
algorithm described in Section 3 is implemented based on
the code maintained by the Independent JPEG Group,
version 8d [29]. We report on experiments conducted
by running this prototype on Samsung Galaxy S3
smartphones.

Figure 4 shows two screenshots of a Facebook page,
with two photos posted. The one on the left is the view
seen by a mobile device which has our recipient-side
decryption and reconstruction algorithm enabled. On the
right is the same page, without that algorithm (so only
the public parts of the images are visible).





[Figure 5 (plot): x-axis: Threshold; series: Public+Secret, Public, Secret, Original; panels: (a) USC-SIPI, (b) INRIA]

Figure 6: PSNR results

5.2 Evaluation Results

In this section, we first report on the trade-off between 
the threshold parameter and storage size in P3. We 
then evaluate various privacy metrics, and conclude with 
an evaluation of reconstruction performance, bandwidth, 
and processing overhead. 

5.2.1 The Threshold vs. Storage Trade-off

In P3, the threshold T is a tunable parameter that trades 
off storage space for privacy: at higher thresholds, fewer 
coefficients are in the secret part but more information 
is exposed in the public part. Figure 5 reports on the
size of the public part (a JPEG image), the secret part (an 
encrypted JPEG image), and the combined size of the 
two parts, as a fraction of the size of the original image, 
for different threshold values T. One interesting feature 
of this figure is that, despite the differences in size and 
composition of the two data sets, their size distribution 
as a function of thresholds is qualitatively similar. At 
low thresholds (near 1), the combined image sizes exceed 
the original image size by about 20%, with the public and 
secret parts being each about 50% of the total size. While 
this setting provides excellent privacy, the large size of 
the secret part can impact bandwidth savings; recall that, 
in P3, the secret part has to be downloaded in its entirety 
even when the public part has been resized significantly. 
Thus, it is important to select a better operating point 
where the size of the secret part is smaller. 

Fortunately, the shape of the curve of Figure 5 for both
datasets suggests operating at the knee of the "secret"
line (at a threshold in the range of 15-20), where the
secret part is about 20% of the original image, and the
total storage overhead is about 5-10%. Figure 7, which
depicts the public and secret parts (recall that the secret
part is also a JPEG image) of a canonical image from the
USC-SIPI dataset, shows that for thresholds in this range
minimal visual information is present in the public part,
with nearly all of it being stored in the secret part. We
include these images to give readers a visual sense of the
efficacy of P3; we conduct more detailed privacy
evaluations below. This suggests that a threshold between
10 and 20 might provide a good balance between privacy and
storage. We solidify this finding below.

Figure 7: Baseline - Encryption Result (T: 1, 5, 10, 15, 20); (a) Public Part, (b) Secret Part

5.2.2 Privacy 

PSNR. One of the earliest objective metrics used for
evaluating the quality of image reconstruction is the peak
signal-to-noise ratio (PSNR). In Figure 6, we present
average PSNRs and standard deviations of the public and
secret parts of the USC-SIPI and the INRIA datasets, as a
function of different thresholds, when compared to the
original image.
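
For readers unfamiliar with the metric, the standard PSNR definition for 8-bit images is 10 log10(255^2 / MSE), where MSE is the mean squared pixel error against the original. The sketch below computes it on flat lists of pixel values; the function name `psnr` and the list-based representation are choices made here for illustration and are not taken from the P3 implementation.

```python
import math
from typing import Sequence


def psnr(original: Sequence[float], degraded: Sequence[float], peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(original, degraded)) / len(original)
    if mse == 0:
        return math.inf          # identical images: no noise at all
    return 10.0 * math.log10(peak ** 2 / mse)


# An error of one gray level at every pixel gives roughly 48 dB.
off_by_one = psnr([100, 120, 140, 160], [101, 121, 141, 161])
# The worst case for 8-bit pixels (every pixel off by 255) gives 0 dB.
worst_case = psnr([0, 0], [255, 255])
```

On this scale, an image that is wrong by a single gray level everywhere already scores about 48 dB, which gives a sense of how severe the 10-15 dB range is.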

The secret parts show high PSNRs, especially when
we consider the fact that 35-40 dB is regarded as
perceptually lossless in the image processing community.
Nonetheless, note that our encryption algorithm uses a
single threshold across entire image blocks and does not
consider block energy distributions. As a result, even
if we get about 40 dB in the secret part, we can identify
non-trivial block effects when we closely observe the
image (Figure 7). It is encouraging that the PSNR values
of the public part are all around 10-15 dB, and that they
increase only slightly with threshold. The extraction of
the DC component into the secret part plays a major part
in leading to such low PSNR values. For the range of
(low) PSNRs that we observe here (e.g., around 15 dB) it
is widely accepted that quality is so degraded that these
images are practically useless.

However, this alone is not an indication that P3
preserves privacy; an examination of the public part at
threshold 100 (not shown) reveals some of the features in
the original image. At lower thresholds these features are
no longer visible (Figure 7), but the difference in PSNR
between a threshold of 10 and 100 is negligible.

For this reason, we consider using several other met- 
rics to quantify the privacy obtained with P3. These met- 
rics quantify the efficacy of automated algorithms on the 
public part; each automated algorithm can be considered 
to be mounting a privacy attack on the public part. 

Edge Detection. Edge detection is an elemental process- 
ing step in many signal processing and machine vision 
applications, and attempts to discover discontinuities in 
various image characteristics. We apply the well-known
Canny edge detector [14] and its implementation [25] to
the public part of images in the USC-SIPI dataset, and
present images with the recognized edges in Figure 9.
For space reasons, we only show edges detected on the
public part of 4 canonical images for thresholds of 1
and 20. The images with a threshold of 20 do reveal several
"features", and signal processing researchers, when told 
that these are canonical images from a widely used data 
set, can probably recognize these images. However, a 
layperson who has not seen the image before very likely 
will not be able to recognize any of the objects in the 
images (the interested reader can browse the USC-SIPI 
dataset online to find the originals). We include these 
images to point out that visual privacy is a highly subjec- 
tive notion, and depends upon the beholder's prior expe- 
riences. If true privacy is desired, end-to-end encryption 
must be used. P3 provides "pretty good" privacy together 
with the convenience and performance offered by photo 
sharing services. 

It is also possible to quantify the privacy offered by
P3 for edge detection attacks. Figure 8(a) plots the
fraction of matching pixels in the image obtained by running
edge detection on the public part, and that obtained by
running edge detection on the original image (the result
of edge detection is an image with binary pixel values).
At threshold values below 20, barely 20% of the pixels
match; at very low thresholds, running edge detection
on the public part results in a picture resembling white
noise, so we believe the higher matching rate shown at
low thresholds simply results from spurious matches. We
conclude that, for the range of parameters we consider,
P3 is very robust to edge detection.
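
The matching metric can be sketched as below. The text does not spell out the exact definition, so this sketch assumes one plausible reading: the fraction of edge pixels in the original's edge map that are also marked as edges at the same position in the public part's edge map. The function name `edge_match_fraction` and the nested-list representation of a binary edge map are illustrative assumptions.

```python
from typing import Sequence


def edge_match_fraction(
    original_edges: Sequence[Sequence[int]],
    public_edges: Sequence[Sequence[int]],
) -> float:
    """Fraction of the original's edge pixels (value 1) also detected in the public part."""
    total_edges = 0
    matched = 0
    for row_o, row_p in zip(original_edges, public_edges):
        for o, p in zip(row_o, row_p):
            if o == 1:
                total_edges += 1
                if p == 1:
                    matched += 1
    return matched / total_edges if total_edges else 0.0


# Toy 2x2 edge maps: the original has three edge pixels, two of which reappear.
original_edges = [[1, 0],
                  [1, 1]]
public_edges = [[1, 1],
                [0, 1]]
fraction = edge_match_fraction(original_edges, public_edges)
```

Under this reading, a noise-like edge map with many edge pixels can score well purely by chance, which is consistent with the spurious matches noted at very low thresholds.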

Face Detection. Face detection algorithms detect human
faces in photos, and were available as part of
Facebook's face recognition API, until Facebook shut down
the API [3]. To quantify the performance of face
detection on P3, we use the Haar face detector from the
OpenCV library [5], and apply it to the public part of
images from Caltech's face dataset [1]. The efficacy of face
detection, as a function of different thresholds, is shown
in Figure 8(b). The y-axis represents the average number
of faces detected; it is higher than 1 for the original
images, because some images have more than one face. P3
completely foils face detection for thresholds below 20;
at thresholds higher than about 35, faces are occasionally
detected in some images.

Figure 8: Privacy on Detection and Recognition Algorithms

SIFT feature extraction. SIFT [36] (or Scale-invariant
Feature Transform) is a general method to detect features
in images. It is used as a pre-processing step in many
image detection and recognition applications from machine
vision. The output of these algorithms is a set of
feature vectors, each of which describes some statistically
interesting aspect of the image.

We evaluate the efficacy of attacking P3 by performing
SIFT feature extraction on the public part. For this,
we use the implementation [35] from the designer of
SIFT together with the default parameters for feature
extraction and feature comparison. Figure 8(c) reports the
results of running feature extraction on the USC-SIPI
dataset. This figure shows two lines, one of which
measures the total number of features detected on the public
part as a function of threshold. This shows that as the
threshold increases, predictably, the number of detected
features increases to match the number of features
detected in the original figure. More interesting is the fact
that, below the threshold of 10, no SIFT features are
detected, and below a threshold of 20, only about 25% of
the features are detected.

However, this latter number is a little misleading,
because we found that, in general, SIFT detects different
feature vectors in the public part and the original image.
If we count the number of features detected in the public
part that are less than a distance d (in feature space)
from the nearest feature in the original image (indicating
that, plausibly, SIFT may have found, in the public part,
a feature of the original image), we find that this number
is far smaller; up to a threshold of 35, a very small
fraction of original features are discovered, and even at the
threshold of 100, only about 4% of the original features
have been discovered. We use the default parameter for
the distance d in the SIFT implementation; changing the
parameter does not change our conclusions.
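
The counting step described above can be sketched as a nearest-neighbor test in descriptor space. This is only an illustration of the idea as worded in the text: it assumes descriptors are plain numeric vectors compared by Euclidean distance against an absolute cutoff `d`, whereas the SIFT implementation used in the paper has its own matching criterion and parameters. The function name and the toy two-dimensional descriptors are hypothetical.

```python
import math
from typing import Sequence


def count_matched_features(
    public_descriptors: Sequence[Sequence[float]],
    original_descriptors: Sequence[Sequence[float]],
    d: float,
) -> int:
    """Count public-part features whose nearest original feature is closer than d."""
    if not original_descriptors:
        return 0
    matched = 0
    for p in public_descriptors:
        nearest = min(math.dist(p, o) for o in original_descriptors)
        if nearest < d:
            matched += 1
    return matched


# Toy 2-D "descriptors" (real SIFT descriptors are much longer vectors).
original_descriptors = [(0.0, 0.0), (10.0, 10.0)]
public_descriptors = [(0.1, 0.0), (5.0, 5.0), (10.0, 10.2)]
matched = count_matched_features(public_descriptors, original_descriptors, d=0.5)
```

Here three features are detected in the public part, but only two lie near a feature of the original, which mirrors the gap between the two lines in Figure 8(c).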

Face Recognition. Face recognition algorithms take an
aligned and normalized face image as input and match it
against a database of faces. They return the best possible
answer, e.g., the closest match or an ordered list of
matches, from the database. We use the Eigenface [50]
algorithm and a well-known face recognition evaluation
system [13] with the Color FERET database. On EigenFace,
we apply two distance metrics, the Euclidean and
the Mahalanobis Cosine [12], for our evaluation.

Footnote: The SIFT algorithm is computationally expensive,
and the INRIA data set is large, so we do not have the
results for the INRIA dataset. (Recall that we need to
compute for a large number of threshold values.) We expect
the results to be qualitatively similar.

Footnote: Our results use a distance parameter of 0.6 from
[35]; we also tried 0.8, the highest distance parameter
that seems to be meaningful ([36], Figure 11), and the
results are similar.

We examine two settings. The Normal-Public setting
considers the case in which training is performed on normal
training images in the database and testing is executed
on public parts. The Public-Public setting trains the
database using public parts of the training images; this
setting is a stronger attack on P3 than Normal-Public.



Figure 8(d) shows a subset of our results, based on
the Mahalanobis Cosine distance metric and using the
FAFB probing set in the FERET database. To quantify
the recognition performance, we follow the methodology
proposed by [41, 42]. In this graph, a data point
at (x, y) means that y% of the time, the correct answer
is contained in the top x answers returned by the
EigenFace algorithm. In the absence of P3 (represented by the
Normal-Normal line), the recognition accuracy is over
80%.
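
The (x, y) curve described above can be computed from ranked match lists as follows. This is a minimal sketch of that definition only; the function name `cumulative_match_curve`, the identifier strings, and the toy rankings are made up for illustration and do not come from the evaluation system used in the paper.

```python
from typing import List, Sequence


def cumulative_match_curve(
    ranked_results: Sequence[Sequence[str]],
    true_ids: Sequence[str],
    max_rank: int,
) -> List[float]:
    """For each rank x in 1..max_rank, the percentage of probes whose correct
    identity appears among the top x answers."""
    n = len(true_ids)
    curve = []
    for x in range(1, max_rank + 1):
        hits = sum(
            1 for ranked, truth in zip(ranked_results, true_ids) if truth in ranked[:x]
        )
        curve.append(100.0 * hits / n)
    return curve


# Four probe images of subject "a"; each list is the recognizer's ranking, best first.
ranked_results = [
    ["a", "b", "c"],
    ["b", "a", "c"],
    ["c", "b", "a"],
    ["b", "c", "a"],
]
true_ids = ["a", "a", "a", "a"]
curve = cumulative_match_curve(ranked_results, true_ids, max_rank=3)
```

The first entry of the curve is the rank-1 recognition rate, which is the number quoted when the text says the recognition accuracy is over 80% without P3.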

If we consider the proposed range of operating
thresholds (T=1-20), the recognition rate is below 20% at rank
1. Put another way, for these thresholds, more than 80% 
of the time, the face recognition algorithm provides the 
wrong answer (a false positive). Moreover, our maxi- 
mum threshold (T=20) shows about a 45% rate at rank 
50, meaning that less than half the time the correct an- 
swer lies in the top 50 matches returned by the algorithm. 
We also examined other settings, e.g., Euclidean distance 
and other probing sets, and the results were qualitatively 
similar. These recognition rates are so low that a face 
recognition attack on P3 is unlikely to succeed; even if 
an attacker were to apply face recognition on P3, and 
even if the algorithm happens to be correct 20% of the 
time, the attacker may not be able to distinguish between 
a true positive and a false positive since the public image 
contains little visual information. 

5.3 What is Lost? 

P3 achieves privacy but at some cost to reconstruction 
accuracy, as well as bandwidth and processing overhead. 

Reconstruction Accuracy. As discussed in Section 3,
the reconstruction of an image to which a linear
transformation has been applied should, in theory, be perfect.
In practice, however, quantization effects in JPEG
compression can introduce very small errors in
reconstruction. Most images in the USC-SIPI dataset can be
reconstructed, when the transformations are known a
priori, with an average PSNR of 49.2 dB. In the signal
processing community, this would be considered practically
lossless. More interesting is the efficacy of our
reconstruction of Facebook and Flickr's transformations. In
Section 4, we described an exhaustive parameter search
space methodology to approximately reverse engineer
Facebook and Flickr's transformations. Our methodology
is fairly successful, resulting in images with PSNR
of 34.4 dB for Facebook and 39.8 dB for Flickr. To an
untrained eye, images with such PSNR values are
generally blemish-free. Thus, using P3 does not significantly
degrade the accuracy of the reconstructed images.

Bandwidth usage cost. In P3, suppose a recipient
downloads, from a PSP, a resized version of an uploaded
image. The total bandwidth usage for this download is the
size of the resized public part, together with the complete
secret part. Without P3, the recipient only downloads the
resized version of the original image. In general, the
former is larger than the latter and the difference between
the two represents the bandwidth usage cost, an
important consideration for usage-metered mobile data plans.
This cost, as a function of the P3 threshold, is shown
in Figure 10 for the INRIA dataset (the USC dataset
results are similar). For thresholds in the 10-20 range, this
cost is modest: 20KB or less across different resolutions
(these resolutions are the ones Facebook statically
resizes an uploaded image to). As an aside, the variability
in bandwidth usage cost represents an opportunity:
users who are more privacy conscious can choose lower
thresholds at the expense of slightly higher bandwidth
usage. Finally, we observe that this additional bandwidth
usage can be reduced by trading off storage: a sender
can upload multiple encrypted secret parts, one for each
known static transformation that a PSP performs. We
have not implemented this optimization.
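
The cost defined above is a simple difference, which the following sketch makes explicit. The function name `bandwidth_usage_cost` and the byte counts in the example are hypothetical and chosen only to illustrate the arithmetic; they are not measurements from the paper.

```python
def bandwidth_usage_cost(
    resized_public_bytes: int,
    secret_bytes: int,
    resized_original_bytes: int,
) -> int:
    """Extra bytes downloaded with P3 compared with downloading the resized original.

    With P3 the recipient fetches the resized public part plus the complete
    secret part; without P3 only the resized original is fetched.
    """
    with_p3 = resized_public_bytes + secret_bytes
    without_p3 = resized_original_bytes
    return with_p3 - without_p3


# Hypothetical sizes: 30 KB resized public part, 25 KB secret part,
# 40 KB resized original image.
cost = bandwidth_usage_cost(30_000, 25_000, 40_000)
```

Because the secret part is downloaded in full regardless of the requested resolution, the cost grows in relative terms as the requested image gets smaller.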

Processing Costs. On a Galaxy S3 smartphone, for a 
720x720 image (the largest resolution served by Face- 
book), it takes on average 152 ms to extract the public 
and secret parts, about 55 ms to encrypt/decrypt the se- 
cret part, and 191 ms to reconstruct the image. These 
costs are modest, and unlikely to impact user experience. 

6 Related Work 

We do not know of prior work that has attempted to ad- 
dress photo privacy for photo-sharing services. Our work 
is most closely related to work in the signal processing 
community on image and video privacy. Early efforts 
at image privacy introduced techniques like
region-of-interest masking, blurring, or pixelation [17]. In these
approaches, typically a face or a person in an image is
represented by a blurred or pixelated version; as [17]
shows, these approaches are not particularly effective
against algorithmic attacks like face recognition. A sub- 
sequent generation of approaches attempted to ensure 



^^In our experiments, we mimic PSP resizing using ImageMagick's 
convert program [TJJ 
privacy for surveillance by scrambling coefficients in a
manner qualitatively similar to P3's algorithm [11, 18],
e.g., some of them randomly flip the sign information.
However, this line of work has not explored designs un- 
der the constraints imposed by our problem, namely the 
need for JPEG-compliant images at PSPs to ensure stor- 
age and bandwidth benefits, and the associated require- 
ment for relatively small secret parts. 

This strand is part of a larger body of work on selective
encryption in the image processing community. This
research, much of it conducted in the 90s and early 2000s,
was motivated by ensuring image secrecy while reducing
the computation cost of encryption [38, 34]. This line of
work has explored some of the techniques we use, such as
extracting the DC components [48] and encrypting the
sign of the coefficient [47, 12], as well as techniques
we have not, such as randomly permuting the
coefficients [48, 45]. Relative to this body of work, P3 is novel
in being a selective encryption scheme tailored towards
a novel set of requirements, motivated by photo sharing
services. In particular, to our knowledge, prior work has
not explored selective encryption schemes which permit
image reconstruction when the unencrypted part of the
image has been subjected to transformations like resizing
or cropping. Finally, a pending patent application by one
of the co-authors [40] of this paper includes the idea of
separating an image into two parts, but does not propose
the P3 algorithm, nor does it consider the reconstruction
challenges described in Section 3.

Some recent papers have examined complementary
image security and privacy problems. Johnson et al.
discuss homomorphic encryption based methods for
verifying image signatures when images have been
subject to transformations like cropping, scaling, and
JPEG-like compression [31]. End-to-end image encryption has
been explored for the JPEG 2000 image format [20],
and has resulted in a standard for JPEG 2000 imaging
(JPSEC) [30].

Tangentially related is a body of work in the computer 
systems community on ensuring other forms of privacy: 
secure distributed storage systems I'SIEHHI. and pri- 
vacy and anonymity for mobile systems |[T9l |26l [T6l . 
None of these techniques directly apply to our setting. 

7 Conclusions 

P3 is a privacy preserving photo sharing scheme that
leverages the sparsity and quality of images to store most
of the information in an image in a secret part, leaving
most of the volume of the image in a JPEG-compliant
public part, which is uploaded to PSPs. P3's public parts
have very low PSNRs and are robust to edge detection,
face detection, or SIFT feature extraction attacks. These
benefits come at minimal costs to reconstruction
accuracy, bandwidth usage and processing overhead.